% -*- Mode: latex; -*-
\documentclass[dvipdfm,11pt]{article}
\usepackage[dvipdfm]{hyperref} % Upgraded url package
\parskip=.1in

% Formatting conventions for contributors
% 
% A quoting mechanism is needed to set off things like file names, command
% names, code fragments, and other strings that would confuse the flow of
% text if left undistinguished from preceding and following text.  In this
% document we use the LaTeX macro '\texttt' to indicate such text in the
% source, which normally produces, when used as in '\texttt{special text}',
% the typewriter font.

% It is particularly easy to use this convention if one is using emacs as
% the editor and LaTeX mode within emacs for editing LaTeX documents.  In
% such a case the key sequence ^C^F^T (hold down the control key and type
% 'cft') produces '\texttt{}' with the cursor positioned between the
% braces, ready for the special text to be typed.  The closing brace can
% be skipped over by typing ^e (go to the end of the line) if entering
% text or ^C-} to just move the cursor past the brace.
%
% Please add index entries for important terms and keywords, including
% environment variables that may control the behavior of MPI or one of the
% tools and concepts such as line labeling from mpiexec.

% LaTeX mode is usually loaded automatically.  At Argonne, one way to 
% get several useful emacs tools working for you automatically is to put
% the following in your .emacs file.

% (require 'tex-site)
% (setq LaTeX-mode-hook '(lambda ()
%                        (auto-fill-mode 1)
%                        (flyspell-mode 1)
%                        (reftex-mode 1)
%                        (setq TeX-command "latex")))
   
\begin{document}
\markright{MPICH2 User's Guide}
\title{\textbf{MPICH2 User's Guide}\thanks{This work was supported by the Mathematical,
    Information, and Computational Sciences Division subprogram of the
    Office of Advanced Scientific Computing Research, SciDAC Program,
    Office of Science, U.S. Department of Energy, under Contract
    DE-AC02-06CH11357.}\\
Version %MPICH2_VERSION%\\
Mathematics and Computer Science Division\\
Argonne National Laboratory}

\author{
Pavan Balaji\\
Darius Buntinas\\
Ralph Butler\\
Anthony Chan\\
David Goodell\\
William Gropp\\
Jayesh Krishna\\
Rob Latham\\
Ewing Lusk\\
Guillaume Mercier\\
Rob Ross\\
Rajeev Thakur\\[2.0ex]
\textbf{Past Contributors:}\\
David Ashton\\
Brian Toonen
}

\maketitle

\cleardoublepage

\pagenumbering{roman}
\tableofcontents
\clearpage

\pagenumbering{arabic}
\pagestyle{headings}


\section{Introduction}
\label{sec:introduction}

This manual assumes that MPICH2 has already been installed.  For
instructions on how to install MPICH2, see the \emph{MPICH2
  Installer's Guide}, or the \texttt{README} in the top-level MPICH2
directory.  This manual explains how to compile, link, and run MPI
applications, and use certain tools that come with MPICH2.  This is a
preliminary version and some sections are not complete yet.  However,
there should be enough here to get you started with MPICH2.


\section{Getting Started with MPICH2}
\label{sec:migrating}

MPICH2 is a high-performance and widely portable implementation of the
MPI Standard, designed to implement all of MPI-1 and MPI-2 (including
dynamic process management, one-sided operations, parallel I/O, and
other extensions). The \emph{MPICH2 Installer's Guide} provides some
information on MPICH2 with respect to configuring and installing
it. Details on compiling, linking, and running MPI programs are
described below.


\subsection{Default Runtime Environment}
\label{sec:default-environment}

MPICH2 provides a separation of process management and communication.
The default runtime environment in MPICH2 is called Hydra. Other
process managers are also available.

\subsection{Starting Parallel Jobs}
\label{sec:startup}

MPICH2 implements \texttt{mpiexec} and all of its standard arguments,
together with some extensions. See Section~\ref{sec:mpiexec-standard}
for standard arguments to \texttt{mpiexec} and various subsections of
Section~\ref{sec:mpiexec} for extensions particular to various process
management systems.


\subsection{Command-Line Arguments in Fortran}
\label{sec:fortran-command-line}

MPICH1 (more precisely MPICH1's \texttt{mpirun}) required access to
command line arguments in all application programs, including Fortran
ones, and MPICH1's \texttt{configure} devoted some effort to finding
the libraries that contained the right versions of \texttt{iargc} and
\texttt{getarg} and including those libraries with which the
\texttt{mpif77} script linked MPI programs.  Since MPICH2 does not
require access to command line arguments to applications, these
functions are optional, and \texttt{configure} does nothing special
with them.  If you need them in your applications, you will have to
ensure that they are available in the Fortran environment you are
using.

\section{Quick Start}
\label{sec:quickstart}

To use MPICH2, you will have to know the directory where MPICH2 has
been installed.  (Either you installed it there yourself, or your
systems administrator has installed it.  One place to look in this
case might be \texttt{/usr/local}.  If MPICH2 has not yet been
installed, see the \emph{MPICH2 Installer's Guide}.)  We suggest that
you put the \texttt{bin} subdirectory of that directory into your
path.  This will give you access to assorted MPICH2 commands to
compile, link, and run your programs conveniently.  Other commands in
this directory manage parts of the run-time environment and execute
tools.

One of the first commands you might run is \texttt{mpich2version} to
find out the exact version and configuration of MPICH2 you are working
with. Some of the material in this manual depends on just what version
of MPICH2 you are using and how it was configured at installation
time.

You should now be able to run an MPI program.  Let us assume that the
directory where MPICH2 has been installed is
\texttt{/home/you/mpich2-installed}, and that you have added that directory to
your path, using 
\begin{verbatim}
    setenv PATH /home/you/mpich2-installed/bin:$PATH
\end{verbatim}
for \texttt{tcsh} and \texttt{csh}, or 
\begin{verbatim}
    export PATH=/home/you/mpich2-installed/bin:$PATH
\end{verbatim}
for \texttt{bash} or \texttt{sh}.
Then to run an MPI program, albeit only on one machine, you can do:
\begin{verbatim}
    cd  /home/you/mpich2-installed/examples
    mpiexec -n 3 ./cpi
\end{verbatim}

Details for these commands are provided below, but if you can
successfully execute them here, then you have a correctly installed
MPICH2 and have run an MPI program. 

\section{Compiling and Linking}
\label{sec:compiling}

A convenient way to compile and link your program is by using scripts
that use the same compiler that MPICH2 was built with.  These are
\texttt{mpicc}, \texttt{mpicxx}, \texttt{mpif77}, and \texttt{mpif90},
for C, C++, Fortran 77, and Fortran 90 programs, respectively.  If any
of these commands are missing, it means that MPICH2 was configured
without support for that particular language.

%% Pavan Balaji (12/27/2009): I'm commenting out this part as this is
%% broken in the current MPICH2 stack (see ticket #502).

%% \subsection{Specifying Compilers}
%% \label{sec:specifying-compilers}

%% You need not use the same compiler that MPICH2 was built with, but not
%% all compilers are compatible.  You can also specify the compiler for
%% building MPICH2 itself, as reported by \texttt{mpich2version}, just by
%% using the compiling and linking commands from the previous section.
%% The environment variables \texttt{MPICH_CC}, \texttt{MPICH_CXX},
%% \texttt{MPICH_F77}, and \texttt{MPICH_F90} may be used to specify
%% alternate C, C++, Fortran 77, and Fortran 90 compilers, respectively.


\subsection{Special Issues for C++}
\label{sec:cxx}

Some users may get error messages such as
\begin{small}
\begin{verbatim}
    SEEK_SET is #defined but must not be for the C++ binding of MPI
\end{verbatim}
\end{small}
The problem is that both \texttt{stdio.h} and the MPI C++ interface use
\texttt{SEEK\_SET}, \texttt{SEEK\_CUR}, and \texttt{SEEK\_END}.  This is really a bug
in the MPI-2 standard.  You can try adding 
\begin{verbatim}
    #undef SEEK_SET
    #undef SEEK_END
    #undef SEEK_CUR
\end{verbatim}
before \texttt{mpi.h} is included, or add the definition
\begin{verbatim}
    -DMPICH_IGNORE_CXX_SEEK
\end{verbatim}
to the command line (this will cause the MPI versions of \texttt{SEEK\_SET}
etc. to be skipped).

\subsection{Special Issues for Fortran}
\label{sec:fortran}

MPICH2 provides two kinds of support for Fortran programs.  For
Fortran 77 programmers, the file \texttt{mpif.h} provides the
definitions of the MPI constants such as \texttt{MPI\_COMM\_WORLD}.
Fortran 90 programmers should use the \texttt{MPI} module instead;
this provides all of the definitions as well as interface definitions
for many of the MPI functions.  However, this MPI module does not
provide full Fortran 90 support; in particular, interfaces for the
routines, such as \texttt{MPI\_Send}, that take ``choice'' arguments
are not provided.


\section{Running Programs with \texttt{mpiexec}}
\label{sec:mpiexec}

The MPI-2 Standard describes \texttt{mpiexec} as a suggested way to
run MPI programs. MPICH2 implements the \texttt{mpiexec} standard, and
also provides some extensions.

\subsection{Standard \texttt{mpiexec}}
\label{sec:mpiexec-standard}

Here we describe the standard \texttt{mpiexec} arguments from the MPI-2
Standard~\cite{mpi-forum:mpi2-journal}.  The simplest form of a command
to start an MPI job is 

\begin{verbatim}
   mpiexec -f machinefile -n 32 a.out
\end{verbatim}

to start the executable \texttt{a.out} with 32 processes (providing an
\texttt{MPI\_COMM\_WORLD} of size 32 inside the MPI application).
Other options are supported, for search paths for executables, working
directories, and even a more general way of specifying a number of
processes.  Multiple sets of processes can be run with different
executables and different values for their arguments, with
``\texttt{:}'' separating the sets of processes, as in:

\begin{verbatim}
   mpiexec -f machinefile -n 1 ./master : -n 32 ./slave
\end{verbatim}

It is also possible to start a one process MPI job (with a
\texttt{MPI\_COMM\_WORLD} whose size is equal to 1), without using
\texttt{mpiexec}.  This process will become an MPI process when it
calls \texttt{MPI\_Init}, and it may then call other MPI functions.
Currently, MPICH2 does not fully support calling the dynamic process
routines from MPI-2 (e.g., \texttt{MPI\_Comm\_spawn} or
\texttt{MPI\_Comm\_accept}) from processes that are not started with
\texttt{mpiexec}.


\subsection{Extensions for All Process Management Environments}
\label{sec:extensions-uniform}

Some \texttt{mpiexec} arguments are specific to particular
communication subsystems (``devices'') or process management
environments (``process managers'').  Our intention is to make all
arguments as uniform as possible across devices and process managers.
For the time being we will document these separately.

\subsection{\texttt{mpiexec} Extensions for the Hydra Process Manager}

MPICH2 provides a number of process management systems. Hydra is the
default process manager in MPICH2. More details on Hydra and its
extensions to mpiexec can be found at
\url{http://wiki.mcs.anl.gov/mpich2/index.php/Using\_the\_Hydra\_Process\_Manager}


\subsection{Extensions for SMPD Process Management Environment}
\label{sec:extensions-smpd}

SMPD is an alternate process manager that runs on both Unix and Windows.
It can launch jobs across both platforms if the binary formats match 
(big/little endianness and size of C types-- \texttt{int},
\texttt{long}, \texttt{void*}, etc).


\subsubsection{\texttt{mpiexec} arguments for SMPD}
\label{sec:mpiexec-smpd}

\texttt{mpiexec} for smpd accepts the standard MPI-2 \texttt{mpiexec}
options.  Execute
\begin{verbatim}
    mpiexec
\end{verbatim}
or
\begin{verbatim}
    mpiexec -help2
\end{verbatim}
to print the usage options.  Typical usage:
\begin{verbatim}
    mpiexec -n 10 myapp.exe
\end{verbatim}
All options to \texttt{mpiexec}:
\begin{description}
\item[\texttt{-n x}]
\item[\texttt{-np x}]\mbox{}\\
  launch \texttt{x} processes
\item[\texttt{-localonly x}]
\item[\texttt{-np x -localonly}]\mbox{}\\
  launch \texttt{x} processes on the local machine
\item[\texttt{-machinefile filename}]\mbox{}\\
  use a file to list the names of machines to launch on
\item[\texttt{-host hostname}]\mbox{}\\
  launch on the specified host.
\item[\texttt{-hosts n host1 host2 ... hostn}]
\item[\texttt{-hosts n host1 m1 host2 m2 ... hostn mn}]\mbox{}\\
  launch on the specified hosts.
  In the second version the number of processes = m1 + m2 + ... + mn
\item[\texttt{-dir drive:$\backslash$my$\backslash$working$\backslash$directory}]
\item[\texttt{-wdir /my/working/directory}]\mbox{}\\
  launch processes with the specified working directory. (\texttt{-dir}
  and \texttt{-wdir} are equivalent)
\item[\texttt{-env var val}]\mbox{}\\
  set environment variable before launching the processes
%\item[\texttt{-nocolor}]\mbox{}\\
%  don't use process specific output coloring
%\item[\texttt{-nompi}]\mbox{}\\
%  launch processes without the mpi startup mechanism
\item[\texttt{-exitcodes}]\mbox{}\\
  print the process exit codes when each process exits.
\item[\texttt{-noprompt}]\mbox{}\\
  prevent \texttt{mpiexec} from prompting for user credentials.  Instead errors will
be printed and \texttt{mpiexec} will exit.
\item[\texttt{-localroot}]\mbox{}\\
  launch the root process directly from mpiexec if the host is local.
  (This allows the root process to create windows and be debugged.)
\item[\texttt{-port port}]
\item[\texttt{-p port}]\mbox{}\\
  specify the port that \texttt{smpd} is listening on.
\item[\texttt{-phrase passphrase}]\mbox{}\\
  specify the passphrase to authenticate connections to \texttt{smpd} with.
\item[\texttt{-smpdfile filename}]\mbox{}\\
  specify the file where the \texttt{smpd} options are stored including the 
passphrase. (unix only option)
%\item[\texttt{-soft Fortran90\_triple}]\mbox{}\\
%  acceptable number of processes to launch up to maxprocs
\item[\texttt{-path search\_path}]\mbox{}\\
  search path for executable, ; separated
% This will probably not be implemented.
%\item[\texttt{-arch architecture}]\mbox{}\\
%  sun, linux, rs6000, ...
\item[\texttt{-timeout seconds}]\mbox{}\\
  timeout for the job. 
\end{description}
Windows specific options:
\begin{description}
\item[\texttt{-map drive:$\backslash\backslash$host$\backslash$share}]\mbox{}\\
  map a drive on all the nodes
  this mapping will be removed when the processes exit
\item[\texttt{-logon}]\mbox{}\\
  prompt for user account and password
\item[\texttt{-pwdfile filename}]\mbox{}\\
  read the account and password from the file specified.

  put the account on the first line and the password on the second
%\item[\texttt{-nomapping}]\mbox{}\\
%  don't try to map the current directory on the remote nodes
\item[\texttt{-nopopup\_debug}]\mbox{}\\
  disable the system popup dialog if the process crashes
%\item[\texttt{-dbg}]\mbox{}\\
%  catch unhandled exceptions
\item[\texttt{-priority class[:level]}]\mbox{}\\
  set the process startup priority class and optionally level.\mbox{}\\
  class = 0,1,2,3,4   = idle, below, normal, above, high\mbox{}\\
  level = 0,1,2,3,4,5 = idle, lowest, below, normal, above, highest\mbox{}\\
  the default is -priority 2:3
\item[\texttt{-register}]\mbox{}\\
  encrypt a user name and password to the Windows registry.
\item[\texttt{-remove}]\mbox{}\\
  delete the encrypted credentials from the Windows registry.
\item[\texttt{-validate [-host hostname]}]\mbox{}\\
  validate the encrypted credentials for the current or specified host.
\item[\texttt{-delegate}]\mbox{}\\
  use passwordless delegation to launch processes.
\item[\texttt{-impersonate}]\mbox{}\\
  use passwordless authentication to launch processes.
\item[\texttt{-plaintext}]\mbox{}\\
  don't encrypt the data on the wire.
\end{description}


\subsection{Extensions for the gforker Process Management Environment}
\label{sec:extensions-forker}
\texttt{gforker} is a process management system for starting
processes on a single machine, so called because the MPI processes are
simply \texttt{fork}ed from the \texttt{mpiexec} process.  This process
manager supports programs that use \texttt{MPI\_Comm\_spawn} and the other
dynamic process routines, but does not support the use of the dynamic process
routines from programs that are not started with \texttt{mpiexec}.  The
\texttt{gforker} process manager is primarily intended as a debugging aid as
it simplifies development and testing of MPI programs on a single node or
processor.  

\subsubsection{\texttt{mpiexec} arguments for gforker}
\label{sec:mpiexec-forker}

In addition to the standard \texttt{mpiexec} command-line arguments, the
\texttt{gforker} \texttt{mpiexec} supports the following options:
\begin{description}
\item[\texttt{-np <num>}]A synonym for the standard \texttt{-n} argument
\item[\texttt{-env <name> <value>}]Set the environment variable
\texttt{<name>} to \texttt{<value>} for the processes being run by
\texttt{mpiexec}.
\item[\texttt{-envnone}]Pass no environment variables (other than ones
specified with  other \texttt{-env} or \texttt{-genv} arguments) to the
processes being run by \texttt{mpiexec}. 
By default, all environment
variables are provided to each MPI process (rationale: principle of
least surprise for the user)
\item[\texttt{-envlist <list>}]Pass the listed environment variables (names
separated  by commas), with their current values, to the processes being run by
 \texttt{mpiexec}.
\item[\texttt{-genv <name> <value>}]The \item{-genv} options have the same
meaning as their corresponding \texttt{-env} version, except they apply to all
executables, not just the current executable (in the case that the colon
syntax is used to specify multiple execuables).
\item[\texttt{-genvnone}]Like \texttt{-envnone}, but for all executables
\item[\texttt{-genvlist <list>}]Like \texttt{-envlist}, but for all executables
\item[\texttt{-usize <n>}]Specify the value returned for the value of the
attribute \texttt{MPI\_UNIVERSE\_SIZE}.
\item[\texttt{-l}]Label standard out and standard error (\texttt{stdout} and \texttt{stderr}) with 
  the rank of the process
\item[\texttt{-maxtime <n>}]Set a timelimit of \texttt{<n>} seconds.
\item[\texttt{-exitinfo}]Provide more information on the reason each process
exited if there is an abnormal exit
\end{description}

In addition to the commandline argments, the \texttt{gforker} \texttt{mpiexec}
provides a number of environment variables that can be used to control the
behavior of \texttt{mpiexec}:

\begin{description}
\item[\texttt{MPIEXEC\_TIMEOUT}]Maximum running time in seconds.
\texttt{mpiexec} will terminate MPI programs that take longer than the value
specified by \texttt{MPIEXEC\_TIMEOUT}.  
\item[\texttt{MPIEXEC\_UNIVERSE\_SIZE}]Set the universe size
\item[\texttt{MPIEXEC\_PORT\_RANGE}]Set the range of ports that
\texttt{mpiexec} will use  
  in communicating with the processes that it starts.  The format of 
  this is \texttt{<low>:<high>}.  For example, to specify any port between
  10000 and 10100, use \texttt{10000:10100}.  
\item[\texttt{MPICH\_PORT\_RANGE}]Has the same meaning as
\texttt{MPIEXEC\_PORT\_RANGE} and is used if \texttt{MPIEXEC\_PORT\_RANGE} is
not set. 
\item[\texttt{MPIEXEC\_PREFIX\_DEFAULT}]If this environment variable is set,
output to standard output is prefixed by the rank in \texttt{MPI\_COMM\_WORLD}
of the process and output to standard error is prefixed by the rank and the
text \texttt{(err)}; both are followed by an angle bracket (\texttt{>}).  If 
  this variable is not set, there is no prefix.
\item[\texttt{MPIEXEC\_PREFIX\_STDOUT}]Set the prefix used for lines sent to
standard output.  A \texttt{\%d} is replaced with the rank in
\texttt{MPI\_COMM\_WORLD}; a \texttt{\%w} is replaced with an indication of
which \texttt{MPI\_COMM\_WORLD} in MPI jobs that involve multiple
\texttt{MPI\_COMM\_WORLD}s (e.g., ones that use \texttt{MPI\_Comm\_spawn} or
\texttt{MPI\_Comm\_connect}). 
\item[\texttt{MPIEXEC\_PREFIX\_STDERR}]Like \texttt{MPIEXEC\_PREFIX\_STDOUT},
but for standard error. 
\item[\texttt{MPIEXEC\_STDOUTBUF}]Sets the buffering mode for standard
  output.  Valid  values are \texttt{NONE} (no buffering),
  \texttt{LINE} (buffering by lines), and \texttt{BLOCK} (buffering by
  blocks of characters; the size of the block is implementation
  defined).  The default is \texttt{NONE}. 
\item[\texttt{MPIEXEC\_STDERRBUF}]Like \texttt{MPIEXEC\_STDOUTBUF},
  but for standard error. 
\end{description}

\subsection{Restrictions of the remshell Process Management Environment}
\label{sec:restrictions-remshell}

The \texttt{remshell} ``process manager'' provides a very simple version of
\texttt{mpiexec} that makes use of the secure shell command (\texttt{ssh}) to
start processes on a collection of machines.  As this is intended primarily as
an illustration of how to build a version of \texttt{mpiexec} that works with
other process managers, it does not implement all of the features of the other
\texttt{mpiexec} programs described in this document.  In particular, it
ignores the command line options that control the environment variables given
to the MPI programs.  It does support the same output labeling features
provided by the \texttt{gforker} version of \texttt{mpiexec}. 
However, this version of \texttt{mpiexec} can be used
much like the \texttt{mpirun} for the \texttt{ch\_p4} device in MPICH-1 to run
programs on a collection of machines that allow remote shells.  A file by the
name of \texttt{machines} should contain the names of machines on which
processes can be run, one machine name per line.  There must be enough
machines listed to satisfy the requested number of processes; you can list the
same machine name multiple times if necessary.  


\subsection{Using MPICH2 with SLURM and PBS}
\label{sec:external_pm}

There are multiple ways of using MPICH2 with SLURM or PBS. Hydra
provides native support for both SLURM and PBS, and is likely the
easiest way to use MPICH2 on these systems (see the Hydra
documentation above for more details).

Alternatively, SLURM also provides compatibility with MPICH2's
internal process management interface. To use this, you need to
configure MPICH2 with SLURM support, and then use the {\texttt srun}
job launching utility provided by SLURM.

For PBS, MPICH2 jobs can be launched in two ways: (i) use Hydra's
mpiexec with the appropriate options corresponding to PBS, or (ii)
using the OSC mpiexec.


\subsubsection{OSC mpiexec}
\label{sec:osc_mpiexec}

Pete Wyckoff from the Ohio Supercomputer Center provides a alternate
utility called OSC mpiexec to launch MPICH2 jobs on PBS systems. More
information about this can be found here:
\url{http://www.osc.edu/~pw/mpiexec}


\section{Debugging}
\label{sec:debugging}

Debugging parallel programs is notoriously difficult.  Here we describe
a number of approaches, some of which depend on the exact version of
MPICH2 you are using. 


\subsection{TotalView}
\label{sec:totalview}

MPICH2 supports use of the TotalView debugger from Etnus.  If MPICH2
has been configured to enable debugging with TotalView then one can
debug an MPI program using

\begin{verbatim}
    totalview -a mpiexec -a -n 3 cpi
\end{verbatim}

You will get a popup window from TotalView asking whether you want to
start the job in a stopped state.  If so, when the TotalView window
appears, you may see assembly code in the source window.  Click on
\texttt{main} in the stack window (upper left) to see the source of
the \texttt{main} function.  TotalView will show that the program (all
processes) are stopped in the call to \texttt{MPI\_Init}.

If you have TotalView 8.1.0 or later, you can use a TotalView feature
called indirect launch with MPICH2. Invoke TotalView as:

\begin{verbatim}
    totalview <program> -a <program args>
\end{verbatim}

Then select the Process/Startup Parameters command. Choose the
Parallel tab in the resulting dialog box and choose MPICH2 as the
parallel system. Then set the number of tasks using the Tasks field
and enter other needed mpiexec arguments into the Additional Starter
Arguments field.

\section{Checkpointing}
\label{sec:checkpointing}
MPICH2 supports checkpoint/rollback fault tolerance when used with the
Hydra process manager.  Currently only the BLCR checkpointing library
is supported.  BLCR needs to be installed separately.  Below we
describe how to enable the feature in MPICH2 and how to use it.  This
information can also be found on the MPICH Wiki:
\url{http://wiki.mcs.anl.gov/mpich2/index.php/Checkpointing}

\subsection{Configuring for checkpointing}
\label{sec:conf-checkp}

First, you need to have BLCR version 0.8.2 installed on your
machine.  If it's installed in the default system location, add the
following two options to your configure command:
\begin{small}
\begin{verbatim}
    --enable-checkpointing --with-hydra-ckpointlib=blcr
\end{verbatim}
\end{small}

If BLCR is not installed in the default system location, you'll need
to tell MPICH2's configure where to find it.  You might also need to
set the \texttt{LD\_LIBRARY\_PATH} environment variable so that BLCR's shared
libraries can be found.  In this case add the following options to your
configure command:
\begin{small}
\begin{verbatim}
    --enable-checkpointing --with-hydra-ckpointlib=blcr 
    --with-blcr=BLCR_INSTALL_DIR LD_LIBRARY_PATH=BLCR_INSTALL_DIR/lib
\end{verbatim}
\end{small}
where \texttt{BLCR\_INSTALL\_DIR} is the directory where BLCR has been
installed (whatever was specified in \texttt{--prefix} when BLCR was
configured).  Note, checkpointing is only supported with the Hydra
process manager.  Hyrda will used by default, unless you choose
something else with the \texttt{--with-pm=} configure option.

After it's configured, compile as usual (e.g., \texttt{make; make install}). 

\subsection{Taking checkpoints}
\label{sec:taking-checkpoints}

To use checkpointing, include the \texttt{-ckpointlib} option for
\texttt{mpiexec} to specify the checkpointing library to use and
\texttt{-ckpoint-prefix} to specify the directory where the checkpoint
images should be written:
\begin{small}
\begin{verbatim}
    shell$ mpiexec -ckpointlib blcr \
           -ckpoint-prefix /home/buntinas/ckpts/app.ckpoint \
           -f hosts -n 4 ./app
\end{verbatim}
\end{small}

While the application is running, the user can request for a
checkpoint at any time by sending a \texttt{SIGUSR1} signal to
\texttt{mpiexec}.  You can also automatically checkpoint the
application at regular intervals using the mpiexec option
\texttt{-ckpoint-interval} to specify the number of seconds between
checkpoints:
\begin{small}
\begin{verbatim}
    shell$ mpiexec -ckpointlib blcr \
           -ckpoint-prefix /home/buntinas/ckpts/app.ckpoint \
           -ckpoint-interval 3600 -f hosts -n 4 ./app
\end{verbatim}
\end{small}

The checkpoint/restart parameters can also be controlled with the
environment variables \texttt{HYDRA\_\linebreak[0]CKPOINTLIB},
\texttt{HYDRA\_\linebreak[0]CKPOINT\_\linebreak[0]PREFIX} and
\texttt{HYDRA\_\linebreak[0]CKPOINT\_\linebreak[0]INTERVAL}.

Each checkpoint generates one file per node.  Note that checkpoints
for all processes on a node will be stored in the same file.  Each
time a new checkpoint is taken an additional set of files are created.
The files are numbered by the checkpoint number.  This allows the
application to be restarted from checkpoints other than the most
recent.  The checkpoint number can be specified with the
\texttt{-ckpoint-num} parameter.  To restart a process:
\begin{small}
\begin{verbatim}
    shell$ mpiexec -ckpointlib blcr \
           -ckpoint-prefix /home/buntinas/ckpts/app.ckpoint \
           -ckpoint-num 5 -f hosts -n 4
\end{verbatim}
\end{small}

Note that by default, the process will be restarted from the first
checkpoint, so in most cases, the checkpoint number should be
specified.

\section{MPE}
\label{sec:mpe}
MPICH2 comes with the same MPE (Multi-Processing Environment) tools that are
included with MPICH1.  These include several trace libraries for recording the
execution of MPI programs and the Jumpshot and SLOG tools for performance
visualization, and a MPI collective and datatype checking library.
The MPE tools are built and installed by default and should be available
without requiring any additional steps.  The easiest way to use MPE profiling
libraries is through the \texttt{-mpe=} switch provided by MPICH2's
compiler wrappers,
\texttt{mpicc}, \texttt{mpicxx}, \texttt{mpif77}, and \texttt{mpif90}.

\subsection{MPI Logging}
\label{sec:mpe_mpilog}
MPE provides automatic MPI logging.
For instance, to view MPI communication pattern of a program, fpilog.f,
one can simply link the source file as follows:
\begin{verbatim}
mpif90 -mpe=mpilog -o fpilog fpilog.f
\end{verbatim}
The \texttt{-mpe=mpilog} option will link with appropriate MPE profiling
libraries.  Then running the program through \texttt{mpiexec}
will result a logfile, \texttt{Unknown.clog2}, in the working directory.
The final step is to convert and view the logfile through Jumpshot:
\begin{verbatim}
jumpshot Unknown.clog2
\end{verbatim}

\subsection{User-defined logging}
\label{sec:mpe_userlog}
In addition to using the predefined MPE logging to log MPI calls,
MPE logging calls can be inserted into user's MPI program to define
and log states.  These states are called User-Defined states.  States may
be nested, allowing one to define a state describing a user routine that
contains several MPI calls, and display both the user-defined state and
the MPI operations contained within it.

The typical way to insert user-defined states is as follows:
\begin{itemize}
\item{Get handles from MPE logging library:
      \texttt{MPE\_Log\_get\_state\_eventIDs()}
      has to be used to get unique event IDs (MPE logging handles).
        \footnote{Older MPE libraries provide
        \texttt{MPE\_Log\_get\_event\_number()} which is still being
        supported but has been deprecated.  Users are strongly urged
        to use \texttt{MPE\_Log\_get\_state\_eventIDs()} instead.}
      This is important if you are writing a library that uses
      the MPE logging routines from the MPE system.  Hardwiring the
      eventIDs is considered a bad idea since it may cause eventID
      confict and so the practice isn't supported.}
\item{Set the logged state's characteristics: \texttt{MPE\_Describe\_state()}
      sets the name and color of the states.}
\item{Log the events of the logged states: \texttt{MPE\_Log\_event()}
      are called twice to log the user-defined states.}
\end{itemize}
Below is a simple example that uses the 3 steps outlined above.
\begin{samepage}
\begin{verbatim}
int eventID_begin, eventID_end;
...
MPE_Log_get_state_eventIDs( &eventID_begin, &eventID_end );
...
MPE_Describe_state( eventID_begin, eventID_end,
                    "Multiplication", "red" );
...
MyAmult( Matrix m, Vector v )
{
    /* Log the start event of the red "Multiplication" state */
    MPE_Log_event( eventID_begin, 0, NULL );
    ... Amult code, including MPI calls ...
    /* Log the end event of the red "Multiplication" state */
    MPE_Log_event( eventID_end, 0, NULL );
}
\end{verbatim}
\end{samepage}
The logfile generated by this code will have the MPI routines nested within
the routine MyAmult().

Besides user-defined states, MPE2 also provides support for user-defined
events which can be defined through use of
\texttt{MPE\_Log\_get\_solo\_eventID()} and \texttt{MPE\_Describe\_event()}.
For more details, e.g. see cpilog.c.

\subsection{MPI Checking}
\label{sec:mpe_mpicheck}
To validate all the MPI collective calls in a program by
linking the source file as follows:
\begin{verbatim}
mpif90 -mpe=mpicheck -o wrong_reals wrong_reals.f
\end{verbatim}
Running the program will result with the following output:
\begin{verbatim}
> mpiexec -n 4 wrong_reals
Starting MPI Collective and Datatype Checking!
 Process            3  of            4  is alive
Backtrace of the callstack at rank 3:
        At [0]: wrong_reals(CollChk_err_han+0xb9)[0x8055a09]
        At [1]: wrong_reals(CollChk_dtype_scatter+0xbf)[0x8057bff]
        At [2]: wrong_reals(CollChk_dtype_bcast+0x3d)[0x8057ccd]
        At [3]: wrong_reals(MPI_Bcast+0x6c)[0x80554bc]
        At [4]: wrong_reals(mpi_bcast_+0x35)[0x80529b5]
        At [5]: wrong_reals(MAIN__+0x17b)[0x805264f]
        At [6]: wrong_reals(main+0x27)[0x80dd187]
        At [7]: /lib/libc.so.6(__libc_start_main+0xdc)[0x9a34e4]
        At [8]: wrong_reals[0x8052451]
[cli_3]: aborting job:
Fatal error in MPI_Comm_call_errhandler:

Collective Checking: BCAST (Rank 3) --> Inconsistent datatype signatures
                                        detected between rank 3 and rank 0.
\end{verbatim}
The error message here shows that the \texttt{MPI\_Bcast} has been used with
inconsistent datatype in the program wrong\_reals.f.

\subsection{MPE options}
\label{sec:mpe_options}
Other MPE profiling options that are available through MPICH2 compiler
wrappers are 
\begin{verbatim}
    -mpe=mpilog     : Automatic MPI and MPE user-defined states logging.
                      This links against -llmpe -lmpe.

    -mpe=mpitrace   : Trace MPI program with printf.
                      This links against -ltmpe.

    -mpe=mpianim    : Animate MPI program in real-time.
                      This links against -lampe -lmpe.

    -mpe=mpicheck   : Check MPI Program with the Collective & Datatype
                      Checking library.  This links against -lmpe_collchk.

    -mpe=graphics   : Use MPE graphics routines with X11 library.
                      This links against -lmpe <X11 libraries>.

    -mpe=log        : MPE user-defined states logging.
                      This links against -lmpe.

    -mpe=nolog      : Nullify MPE user-defined states logging.
                      This links against -lmpe_null.

    -mpe=help       : Print the help page.
\end{verbatim}
For more details of how to use MPE profiling tools, see
\texttt{mpich2/src/mpe2/README}.


\section{Other Tools Provided with MPICH2}
\label{sec:other-tools}
MPICH2 also includes a test suite for MPI-1 and MPI-2 functionality; this
suite may be found in the \texttt{mpich2/test/mpi} source directory and can be
run with the command \texttt{make testing}.  This test suite should work with
any MPI implementation, not just MPICH2.

\section{MPICH2 under Windows}
\label{sec:windows}

\subsection{Directories}
\label{sec:windir}

The default installation of MPICH2 is in
\texttt{C:$\backslash$Program Files$\backslash$MPICH2}. Under the installation
directory are three sub-directories: \texttt{include}, \texttt{bin}, and
\texttt{lib}.  The \texttt{include} and \texttt{lib} directories contain
the header files and libraries necessary to compile MPI applications.  
The \texttt{bin} directory contains the process manager, \texttt{smpd.exe},
and the MPI job launcher, \texttt{mpiexec.exe}.  The dlls that implement
MPICH2 are copied to the Windows system32 directory.

\subsection{Compiling}
\label{sec:wincompile}

The libraries in the \texttt{lib} directory were compiled with MS Visual C++ .NET 2003
and Intel Fortran 8.1.  These 
compilers and any others that can link with the MS \texttt{.lib} files can be used to
create user applications.  \texttt{gcc} and \texttt{g77} for \texttt{cygwin} can be used with the 
\texttt{libmpich*.a} libraries.

For MS Developer Studio users: Create a project and add
\begin{verbatim}
    C:\Program Files\MPICH2\include
\end{verbatim}
to the include path and
\begin{verbatim}
    C:\Program Files\MPICH2\lib
\end{verbatim}
to
the library path.  Add \texttt{mpi.lib} and \texttt{cxx.lib} to the
link command.  Add \texttt{cxxd.lib} to the Debug target link instead of
\texttt{cxx.lib}.

Intel Fortran 8 users should add \texttt{fmpich2.lib} to the link command. 

Cygwin users should use \texttt{libmpich2.a} \texttt{libfmpich2g.a}.

\subsection{Running}
\label{sec:winrun}

MPI jobs are run from a command prompt using \texttt{mpiexec.exe}.  See
Section~\ref{sec:extensions-smpd} on \texttt{mpiexec} for \texttt{smpd}
for a description of the options to \texttt{mpiexec}.

\clearpage
\appendix

\section{Frequently Asked Questions}

The frequently asked questions are maintained online
here:\url{http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions}

\bibliographystyle{plain}
\bibliography{user}

\end{document}


